Left Heavy Tails and the Effectiveness of the Policy and Value Networks in DNN-based best-first search for Sokoban Planning

Neural Information Processing Systems

Despite the success of practical solvers in NP-complete domains such as SAT and CSP, and of deep reinforcement learning in two-player games such as Go, certain classes of PSPACE-hard planning problems have remained out of reach. Even carefully designed domain-specialized solvers can fail quickly on hard instances due to the exponential search space. Recent works that combine traditional search methods, such as best-first search and Monte Carlo tree search, with deep neural network (DNN) heuristics have shown promising progress and can solve a significant number of hard planning instances beyond the reach of specialized solvers. To better understand why these approaches work, we study the interplay of the policy and value networks in DNN-based best-first search on Sokoban and show the surprising effectiveness of the policy network, further enhanced by the value network, as a guiding heuristic for the search. To further understand this phenomenon, we study the cost distribution of the search algorithms and find that Sokoban instances can have heavy-tailed runtime distributions, with tails on both the left- and right-hand sides. In particular, we show for the first time the existence of left heavy tails and propose an abstract tree model that can empirically explain their appearance. The experiments show the critical role of the policy network as a powerful heuristic guiding the search, which can lead to left heavy tails with polynomial scaling by avoiding the exploration of exponentially sized subtrees. Our results also demonstrate the importance of random restarts, widely used in traditional combinatorial solvers, for DNN-based search methods to avoid both left and right heavy tails.
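
The mechanism the abstract describes, a best-first search whose expansion order comes from a learned policy, combined with randomized restarts to escape bad runs, can be sketched compactly. The sketch below is illustrative only: the function names, the node budget, and the noise-based restart scheme are assumptions made for exposition, not the paper's implementation.

```python
import heapq
import random

def policy_guided_bfs(start, is_goal, successors, policy_logp, node_budget=10_000):
    """Best-first search ordered by cumulative policy log-probability.

    policy_logp(state, action) stands in for a trained policy network;
    states must be hashable. Higher-scoring nodes are expanded first, so a
    confident policy steers the search directly toward a solution.
    """
    frontier = [(0.0, 0, start, [])]  # (negated score, tie-breaker, state, plan)
    seen = {start}
    counter, expanded = 1, 0
    while frontier and expanded < node_budget:
        neg_score, _, state, plan = heapq.heappop(frontier)
        expanded += 1
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                score = -neg_score + policy_logp(state, action)
                heapq.heappush(frontier, (-score, counter, nxt, plan + [action]))
                counter += 1
    return None  # budget exhausted; the caller may restart

def search_with_restarts(start, is_goal, successors, policy_logp,
                         budgets=(1_000, 2_000, 4_000, 8_000), noise=0.1):
    """Randomized restarts with growing budgets: perturbing the heuristic on
    each run re-rolls the runtime distribution, truncating the right heavy
    tail, much as restarts do in traditional combinatorial solvers."""
    for budget in budgets:
        perturbed = lambda s, a: policy_logp(s, a) + random.gauss(0.0, noise)
        plan = policy_guided_bfs(start, is_goal, successors, perturbed, budget)
        if plan is not None:
            return plan
    return None
```

When the policy is well calibrated, the first run already lands in the left tail (few expansions); when it is misleading on a particular instance, the restart loop caps the damage instead of letting a single run explore an exponentially sized subtree.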



Which Heroes to Pick? Learning to Draft in MOBA Games with Neural Networks and Tree Search

Chen, Sheng, Zhu, Menghui, Ye, Deheng, Zhang, Weinan, Fu, Qiang, Yang, Wei

arXiv.org Artificial Intelligence

Hero drafting is essential in MOBA game playing, as it builds the team of each side and directly affects the match outcome. State-of-the-art drafting methods fail to consider: 1) drafting efficiency when the hero pool is expanded; 2) the multi-round nature of a MOBA 5v5 match series, i.e., two teams play a best-of-N series and the same hero is only allowed to be drafted once throughout the series. In this paper, we formulate the drafting process as a multi-round combinatorial game and propose a novel drafting algorithm based on neural networks and Monte Carlo tree search, named JueWuDraft. Specifically, we design a long-term value estimation mechanism to handle the best-of-N drafting case. Taking Honor of Kings, one of the most popular MOBA games at present, as a running case, we demonstrate the practicality and effectiveness of JueWuDraft compared to state-of-the-art drafting methods.
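
To make the search component concrete, here is a minimal sketch of UCT over a drafting game. Everything in it is a simplifying assumption: the hero pool size, the alternating pick order (real MOBA drafts follow a fixed pick/ban sequence), and the value_estimate stub standing in for JueWuDraft's trained value network and its long-term, best-of-N value estimation.

```python
import math
import random

HERO_POOL = list(range(40))   # hypothetical hero IDs
PICKS_PER_TEAM = 5

def value_estimate(draft):
    """Stub for the value network: predicted win rate of team 0 given a
    complete draft. A long-term variant would also score the remaining
    rounds of a best-of-N series."""
    return random.random()

def uct_pick(picked, simulations=2_000, c=1.4):
    """Plain UCT over the drafting game, evaluating leaves with the value
    stub instead of random rollouts."""
    stats = {}  # (draft-so-far, hero) -> (visits, total value)
    for _ in range(simulations):
        draft, path = list(picked), []
        while len(draft) < 2 * PICKS_PER_TEAM:
            key = tuple(draft)
            moves = [h for h in HERO_POOL if h not in draft]
            n_parent = 1 + sum(stats.get((key, h), (0, 0.0))[0] for h in moves)
            def ucb(h):
                n, w = stats.get((key, h), (0, 0.0))
                if n == 0:
                    return float("inf")
                return w / n + c * math.sqrt(math.log(n_parent) / n)
            h = max(moves, key=ucb)
            path.append((key, h))
            draft.append(h)
        v = value_estimate(draft)  # leaf evaluation replaces a rollout
        for key, h in path:
            team = len(key) % 2    # whose turn, assuming alternating picks
            n, w = stats.get((key, h), (0, 0.0))
            stats[(key, h)] = (n + 1, w + (v if team == 0 else 1.0 - v))
    root = tuple(picked)
    moves = [h for h in HERO_POOL if h not in picked]
    return max(moves, key=lambda h: stats.get((root, h), (0, 0.0))[0])
```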


Loss-annealed GAIL for sample efficient and stable Imitation Learning

Jena, Rohit, Sycara, Katia

arXiv.org Machine Learning

Imitation learning is the problem of learning a policy from an expert policy without access to a reward signal. Often, the expert policy is only available in the form of expert demonstrations. Behavior cloning (BC) and GAIL are two popular methods for imitation learning in this setting. Behavior cloning converges within a few training iterations, but it does not reach peak performance and suffers from compounding errors due to its supervised training framework and i.i.d. assumption. GAIL tackles this problem by accounting for the temporal dependencies between states while matching the occupancy measures of the expert and the policy. Although GAIL has shown success in a number of environments, it requires many environment interactions. Given their complementary benefits, prior work has tried, or suggested trying, to combine the two methods, without much success. We examine some limitations of existing ideas for combining BC and GAIL and present an algorithm that combines the best of both worlds, enabling faster and more stable training without compromising performance. Our algorithm is embarrassingly simple to implement and integrates seamlessly with different policy gradient algorithms. We demonstrate its effectiveness both in low-dimensional control tasks in a limited-data setting and in high-dimensional grid-world environments.
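
A minimal sketch of what a loss-annealed combination of BC and GAIL could look like, assuming PyTorch and hypothetical policy.log_prob and discriminator(obs, act) interfaces. The linear schedule and the REINFORCE-style surrogate are illustrative choices for exposition, not the paper's exact objective.

```python
import torch

def annealed_imitation_loss(policy, discriminator, expert_batch, policy_batch,
                            step, total_steps, bc_weight_init=1.0):
    """Weighted sum of a BC term and a GAIL-style adversarial term, with the
    BC weight annealed linearly toward zero over training."""
    expert_obs, expert_act = expert_batch
    obs, act, logp = policy_batch  # logp: log-probs of the sampled actions

    # Behavior cloning: maximize likelihood of expert actions.
    bc_loss = -policy.log_prob(expert_obs, expert_act).mean()

    # GAIL-style surrogate: reward = -log(1 - D(s, a)); a REINFORCE
    # estimator is used here for brevity, but any policy-gradient
    # estimator (e.g., PPO) slots in.
    with torch.no_grad():
        d = torch.sigmoid(discriminator(obs, act))
        reward = -torch.log(1.0 - d + 1e-8)
    pg_loss = -(logp * reward).mean()

    # Early training leans on BC (fast convergence); later training leans
    # on GAIL (corrects compounding errors).
    alpha = bc_weight_init * max(0.0, 1.0 - step / total_steps)
    return alpha * bc_loss + (1.0 - alpha) * pg_loss
```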


Sim-to-Real Transfer Learning using Robustified Controllers in Robotic Tasks involving Complex Dynamics

van Baar, Jeroen, Sullivan, Alan, Corcodel, Radu, Jha, Devesh, Romeres, Diego, Nikovski, Daniel

arXiv.org Machine Learning

Learning robot tasks or controllers using deep reinforcement learning has proven effective in simulation. Learning in simulation has several advantages. For example, one can fully control the simulated environment, including halting motions while performing computations. Another advantage, when robots are involved, is that the amount of time a robot is occupied learning a task, rather than being productive, can be reduced by transferring the learned task to the real robot. Transfer learning requires some amount of fine-tuning on the real robot. For tasks that involve complex (non-linear) dynamics, the fine-tuning itself may take a substantial amount of time. To reduce the amount of fine-tuning, we propose to learn robustified controllers in simulation. Robustified controllers are learned by exploiting the ability to change simulation parameters (both appearance and dynamics) across successive training episodes. An additional benefit of this approach is that it alleviates the need to precisely determine the physics parameters of the simulator, which is a non-trivial task. We demonstrate our approach on a real setup in which a robot aims to solve a maze puzzle that involves complex dynamics due to static friction and potentially large accelerations. We show that the amount of fine-tuning required in transfer learning is substantially reduced for a robustified controller compared to a non-robustified one.
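
The robustification recipe, redrawing simulator parameters (dynamics and appearance) on every training episode, is a form of domain randomization. Below is a minimal sketch under assumed interfaces: make_sim_env, a gym-style env, and a generic agent are hypothetical, and the parameter ranges are invented for illustration.

```python
import random

def sample_sim_params():
    """Randomize dynamics and appearance per episode; ranges are
    illustrative, not calibrated values."""
    return {
        "friction":    random.uniform(0.2, 1.0),   # dynamics randomization
        "mass_scale":  random.uniform(0.8, 1.2),
        "light_level": random.uniform(0.5, 1.5),   # appearance randomization
    }

def train_robustified(agent, make_sim_env, episodes=10_000):
    """Domain-randomized training loop: a fresh draw of simulator parameters
    each episode forces the controller to succeed across the whole range,
    so it transfers to the real robot with less fine-tuning."""
    for _ in range(episodes):
        env = make_sim_env(**sample_sim_params())
        obs = env.reset()
        done = False
        while not done:
            action = agent.act(obs)
            obs, reward, done, _ = env.step(action)
            agent.observe(reward, done)
        agent.update()  # one RL update per episode
```

A side benefit, noted in the abstract, is that randomizing physics parameters removes the need to pin down their exact real-world values: the controller only has to work for some plausible range containing them.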